Abstract. It is demonstrated how to represent asymptotically mean stationary
(AMS) random sources with values in standard spaces as mixtures of ergodic
AMS sources. This is an extension of the well-known decomposition of stationary
sources, which has facilitated the generalization of prominent source coding
theorems to arbitrary, not necessarily ergodic, stationary sources. Asymptotic mean
stationarity generalizes the definition of stationarity and covers a much larger
variety of real-world examples of random sources of practical interest. It is sketched
how to obtain source coding and related theorems for arbitrary, not necessarily
ergodic, AMS sources, based on the presented ergodic decomposition.

1 Introduction

The main purpose of this paper is to demonstrate how to decompose asymptotically
mean stationary (AMS) random sources into ergodic AMS sources. The issue was
brought up in [10], as it is involved in a variety of aspects of substantial interest to
information theory. To the best of our knowledge, it had remained unsolved since then.

The ergodic decomposition of AMS sources can be viewed as an extension of the
ergodic decomposition of stationary sources which states that a stationary source can be
decomposed into ergodic components or, in other words, that it is a mixture of stationary
and ergodic sources. This was originally discussed in more abstract measure theoretic
settings (see the subsequent remark 1).

The first result in information theory that builds on the idea of decomposing a source 
into ergodic components was obtained by Jacobs in 1963. He proved that the entropy 
rate of a stationary source is the average of the rates of its ergodic components [17]. In 
1974, the ergodic decomposition of stationary sources was rigorously introduced to the 
community by Gray and Davisson [7] who also provided an intuitive proof for sources 
with values in a discrete alphabet. This turned out to be a striking success as prominent 
theorems from source coding theory and related fields could be extended to arbitrary, 
not necessarily ergodic, stationary sources [8,19,23,27,22,5] (see the references therein
as well as [11] for a complete list). 

In general, these results underscore that ergodic and information theory have tradi- 
tionally been sources of mutual inspiration. 



Remark 1. The first variant of an ergodic decomposition of stationary sources (with 
values in certain topological spaces) was elaborated in a seminal paper by von Neu- 
mann [31]. Subsequently, Kryloff and Bogoliouboff [3] obtained the result for compact 
metric spaces, and it was further extended by Halmos [13,14] to normal spaces. In par- 
allel, Rokhlin [29] proved the decomposition theorem for Lebesgue spaces, which still 
can be considered as one of the most general results. Oxtoby [24] further clarified the 
situation by demonstrating that Kryloff's and Bogoliouboff's results can be obtained as
corollaries of Riesz' representation theorem. In ergodic theory, the corresponding idea 
is now standard [26,32]. 

Asymptotic mean stationarity was first introduced in 1952 by Dowker [4] and fur- 
ther studied by Rechard [28], but became an area of active research only in the early 
1980s, thanks to a fundamental paper of Gray and Kieffer [9]. Asymptotic mean
stationarity is a property that applies to a large variety of natural examples of sources of
practical interest [9]. The reasons are:

1. Asymptotic mean stationarity is stable under conditioning (see [21], p. 33) whereas 
stationarity is not. 

2. To possess ergodic properties w.r.t. bounded measurements is equivalent to asymp- 
totic mean stationarity [4,9]. Note that Birkhoff's theorem (e.g. [21]) states that 
stationarity is sufficient to possess ergodic properties. 

3. The Shannon-McMillan-Breiman (SMB) theorem was iteratively extended to fi- 
nally hold for AMS discrete random sources in 1980 [9]. 

Note that an alternative, elegant proof of the SMB theorem can be achieved by em- 
ploying the ergodic decomposition of stationary sources [1]. The second point gives 
evidence of the practical relevance of AMS sources, as to possess ergodic properties 
is a necessity in a wide range of real-world applications of stochastic processes. For 
example, asymptotic mean stationarity is implicitly assumed when relative frequencies 
along sequences emitted by a real-world process are to converge. See also [20,6] for 
expositions of large classes of AMS processes of practical interest. The validity of the 
SMB theorem is a further theoretical clue to the relevance of AMS sources in informa- 
tion theory. 

The benefits of an ergodic decomposition of AMS sources are, on one hand, to ar- 
range the theory of AMS sources and, on the other hand, to facilitate follow-up results 
in source coding theory and related fields (see the discussion section 7 for some imme- 
diate consequences). In [10], one can find a concise proof of the ergodic decomposition 
of stationary sources as well as the ergodic decomposition of two-sided AMS sources, 
both with values in standard spaces. The case of two-sided AMS sources, however, is a 
straightforward reduction to the stationary case which does not apply for arbitrary AMS 
sources. As the result for arbitrary AMS sources would have been highly desirable, it
was listed as an open question in the discussion section of [10]. 

The main purpose of this paper is to provide a proof of the ergodic decomposition of 
arbitrary (two-sided and one-sided) AMS sources with values in standard spaces which 
cover discrete-valued and all natural examples of topological spaces. 

The paper is organized as follows. In section 2 we collect basic notations and state 
the two main results. The first one is the ergodic decomposition itself and the second 
one is an essential lemma that may be interesting in its own right. In section 3, we 
present basic definitions of probability and measure theory as well as a classical
ergodic theorem (Krengel's stochastic ergodic theorem) required for our purposes. The
statement of Krengel's theorem is intuitively easy to grasp and can be understood by
means of basic definitions from probability theory only. In section 4 we give a proof
of lemma 1. Both the statement and the proof of lemma 1 are crucial for the proof
of the decomposition. In section 5, we list relevant basic properties of standard spaces 
(subsection 5.1) and regular conditional probabilities and conditional expectations (sub- 
section 5.2). Finally, in section 6, we present the proof of the ergodic decomposition. 
For organizational convenience, we have subdivided it into three steps and collected the 
merely technical passages into lemmata which have been deferred to the appendices A 
and B. We conclude by outlining immediate consequences of our result and pointing 
out potential applications in source coding theory, in the discussion section 7. 

2 Basic Notations and Statement of Results 

Let $(\Omega, \mathcal{B})$ be a measurable space and $T : \Omega \to \Omega$ a measurable function. In this setting
(see [26,12]), a probability measure $P$ is called stationary (relative to $T$), if

$$P(B) = P(T^{-1}B)$$

for all $B \in \mathcal{B}$. It is called asymptotically mean stationary (AMS) (relative to $T$), if there
is a measure $\bar{P}$ on $(\Omega, \mathcal{B})$ such that

$$\forall B \in \mathcal{B}: \quad \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} P(T^{-i}B) = \bar{P}(B). \tag{1}$$

Clearly, the measure $\bar{P}$ is stationary and it is therefore called the stationary mean of $P$.
An event $I \in \mathcal{B}$ is called invariant (relative to $T$), if $T^{-1}I = I$. The set of invariant
events is a sub-$\sigma$-algebra of $\mathcal{B}$ which we will denote by $\mathcal{I}$. A probability measure $P$
on $(\Omega, \mathcal{B})$ is said to be ergodic (relative to $T$), if $P(I) \in \{0, 1\}$ for any such invariant
$I \in \mathcal{I}$. Note that an AMS system is ergodic if and only if its stationary mean is.
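
Definition (1) can be checked by hand on a toy example. The following is a minimal sketch under assumptions that are not part of the paper: a three-point space $\Omega = \{0, 1, 2\}$, a deterministic map $T$ stored as a lookup list, and $P$ the point mass at 0. The helper names `pushforward` and `cesaro_mean` are hypothetical; they compute $P \circ T^{-1}$ and the averages in (1) with exact rational arithmetic.

```python
from fractions import Fraction

def pushforward(p, T):
    """Return P o T^{-1}: the mass sitting at point x is moved to T[x]."""
    q = [Fraction(0)] * len(p)
    for x, mass in enumerate(p):
        q[T[x]] += mass
    return q

def cesaro_mean(p, T, n):
    """Return (1/n) * sum_{i=0}^{n-1} P o T^{-i} as a list of point masses."""
    total = [Fraction(0)] * len(p)
    current = list(p)
    for _ in range(n):
        total = [t + c for t, c in zip(total, current)]
        current = pushforward(current, T)
    return [t / n for t in total]

# Toy system (an assumption for illustration): 0 -> 1 -> 2 -> 1 -> 2 -> ...
T = [1, 2, 1]
P = [Fraction(1), Fraction(0), Fraction(0)]              # point mass at 0
P_bar = [Fraction(0), Fraction(1, 2), Fraction(1, 2)]    # candidate stationary mean

P_1000 = cesaro_mean(P, T, 1000)
```

In this toy system $P$ is not stationary, since its push-forward is the point mass at 1. The averages for $n = 1000$ are $(1/1000, 1/2, 499/1000)$, and the candidate limit $(0, 1/2, 1/2)$ is left unchanged by `pushforward`, as a stationary mean has to be.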

In order to apply this theory to (A-valued) random sources, that is, discrete-time 
stochastic processes with values in a standard space A (for a definition of standard 
space see subsection 5.1), one sets 

$$\Omega = A^I = \bigotimes_{i \in I} A$$

where $I \in \{\mathbb{N}, \mathbb{Z}\}$. That is, $\Omega$ is the space of one-sided ($I = \mathbb{N}$) or two-sided ($I = \mathbb{Z}$)
$A$-valued sequences. $\mathcal{B}$ then is set to be the $\sigma$-algebra generated by the cylinder sets of
sequences. A random source is given by a probability measure $P$ on $(\Omega, \mathcal{B})$. Further,
$T : \Omega \to \Omega$ is defined to be the left shift operator, i.e.

$$(Tx)_n = x_{n+1}$$

for $x = (x_0, x_1, \dots, x_n, \dots) \in \Omega$ (one-sided case) or $x = (\dots, x_{-1}, x_0, x_1, \dots) \in \Omega$
(two-sided case).
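
The left shift can be sketched in a few lines. This is only an illustration under the assumption that a one-sided sequence is modelled as a Python function `n -> x_n`; the names `shift` and `in_cylinder` are hypothetical and do not appear in the paper.

```python
def shift(x):
    """Left shift: (Tx)_n = x_{n+1}, with a sequence modelled as a function n -> x_n."""
    return lambda n: x(n + 1)

def in_cylinder(x, pattern):
    """True if x lies in the cylinder set {x : (x_0, ..., x_{k-1}) = pattern}."""
    return all(x(i) == a for i, a in enumerate(pattern))

# Example sequence (an assumption for illustration): 0, 1, 2, 0, 1, 2, ...
x = lambda n: n % 3
Tx = shift(x)
first_values = [Tx(n) for n in range(4)]
```

Applying `shift` once drops the first symbol, so a sequence lies in $T^{-1}B$ for a cylinder set $B$ exactly when its shifted version lies in $B$.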

The main contribution of this paper is to give a proof of the following theorem. 

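Theorem 1. Let $P$ be a probability measure on a standard space $(\Omega, \mathcal{B})$ which is AMS
relative to the measurable $T : \Omega \to \Omega$. Then there is a $T$-invariant set $E \in \mathcal{I}$ with
$P(E) = 1$ such that for each $\omega \in E$ there is an ergodic AMS probability measure $P_\omega$
and the following properties apply:

(a)

$$\forall B \in \mathcal{B}: \quad P_\omega(B) = P_{T\omega}(B).$$

(b)

$$\forall B \in \mathcal{B}: \quad P(B) = \int P_\omega(B) \, dP(\omega).$$

(c) If $f \in L_1(P)$, then also $\omega \mapsto \int f \, dP_\omega \in L_1(P)$ and

$$\int f(\omega) \, dP(\omega) = \int \Big( \int f \, dP_\omega \Big) \, dP(\omega).$$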

Replacing AMS by stationary yields the aforementioned and well-known theorem 
of the ergodic decomposition of stationary random sources (e.g. [10], th. 2.5).
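
To make properties (a) and (b) concrete, here is a minimal sketch on a finite space, which is an assumption for illustration and far simpler than the standard spaces treated in the paper. For a deterministic map $T$ on finitely many points, the events with $T^{-1}I = I$ are exactly the unions of connected components of the graph $x \to Tx$, and $P_\omega$ is $P$ conditioned on the component of $\omega$ (for components of positive mass). The helper names `invariant_atoms` and `conditional_measure` are hypothetical.

```python
from fractions import Fraction

def invariant_atoms(T):
    """Smallest nonempty sets I with T^{-1} I = I: components of the graph x -> T[x]."""
    parent = list(range(len(T)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for x in range(len(T)):
        parent[find(x)] = find(T[x])        # x and T[x] lie in the same atom
    groups = {}
    for x in range(len(T)):
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())

def conditional_measure(p, atom):
    """P conditioned on an atom; assumes the atom has positive P-mass."""
    mass = sum(p[x] for x in atom)
    return [p[x] / mass if x in atom else Fraction(0) for x in range(len(p))]

# Toy system (an assumption): 0 -> 1 <-> 2 and 3 <-> 4; P is AMS but not stationary.
T = [1, 2, 1, 4, 3]
P = [Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2), Fraction(0)]

atoms = invariant_atoms(T)
# P_omega depends on omega only through its atom, which gives property (a).
P_omega = {x: conditional_measure(P, atom) for atom in atoms for x in atom}
# Property (b): integrating omega -> P_omega against P recovers P.
mixture = [sum(P[w] * P_omega[w][x] for w in range(len(P))) for x in range(len(P))]
```

Each $P_\omega$ in this sketch assigns mass one to a single invariant atom, so every invariant event has $P_\omega$-measure 0 or 1, which is the ergodicity required by the theorem.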

The following lemma is a key observation for the proof of theorem 1 and may be 
interesting in its own right. It states that the convergence involved in the definition of 
AMS measures is uniform over the elements of B. This may seem intuitively surprising, 
as the underlying measurable space does not even have to be standard. 

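Lemma 1. Let $P$ be an AMS measure on $(\Omega, \mathcal{B})$ relative to $T$. Then

$$\sup_{B \in \mathcal{B}} \Big| \frac{1}{n} \sum_{i=0}^{n-1} P(T^{-i}B) - \bar{P}(B) \Big| \xrightarrow[n \to \infty]{} 0.$$

In other words, the convergence of (1) is uniform over the events $B \in \mathcal{B}$.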

3 Preliminaries 

3.1 Convergence of Measures 

Definition 1. Let $(P_n)_{n \in \mathbb{N}}$ be a sequence of probability measures on a measurable
space $(\Omega, \mathcal{B})$.




- We say that the $P_n$ converge strongly to a probability measure $P$ if the sequences
$(P_n(B))_{n \in \mathbb{N}}$ converge to $P(B)$ for all $B \in \mathcal{B}$.

- If this convergence happens to be uniform in $B \in \mathcal{B}$ we say that the $P_n$ converge
Skorokhod weakly to $P$.

See [16] for history and detailed characterisations of these definitions. Obviously 
Skorokhod weak convergence implies strong convergence. Seen from this perspective, 
lemma 1 states that the measures $P_n = \frac{1}{n} \sum_{i=0}^{n-1} P \circ T^{-i}$, where $P$ is an AMS
measure and $P \circ T^{-i}(B) := P(T^{-i}B)$, do not only converge strongly (which they do by
definition), but also Skorokhod weakly to the stationary mean $\bar{P}$.
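
On a finite space the supremum over events that appears in the uniform convergence can be computed by brute force. The sketch below is an illustration under that finiteness assumption only; it checks the elementary identity $\sup_B |P(B) - Q(B)| = \frac{1}{2} \sum_x |P(\{x\}) - Q(\{x\})|$ for two example measures. The function names and the example masses are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def sup_event_distance(p, q):
    """Maximum of |P(B) - Q(B)| over all subsets B of a finite space."""
    best = Fraction(0)
    for r in range(len(p) + 1):
        for B in combinations(range(len(p)), r):
            diff = abs(sum(p[x] for x in B) - sum(q[x] for x in B))
            best = max(best, diff)
    return best

def half_l1_distance(p, q):
    """Half of the sum of absolute differences of the point masses."""
    return Fraction(1, 2) * sum(abs(a - b) for a, b in zip(p, q))

# Two example measures on three points (assumed values, for illustration only).
p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
q = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]

sup_dist = sup_event_distance(p, q)
l1_dist = half_l1_distance(p, q)
```

Both quantities agree here, which is why uniform convergence over events can be tested through densities, as the following characterization makes precise in general.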

A helpful characterization of Skorokhod weak convergence is given by the following
theorem. To state it, we recall that a probability measure $Q$ is said to dominate another
probability measure $P$ (written $Q \gg P$) if $Q(B) = 0$ implies $P(B) = 0$ for all
$B \in \mathcal{B}$. The theorem of Radon-Nikodym (e.g. [15]) states that in case of $Q \gg P$
there is a measurable function $f : \Omega \to \mathbb{R}$, called Radon-Nikodym derivative or simply
density, written $f = \frac{dP}{dQ}$, such that

$$P(B) = \int_B f \, dQ$$

for all $B \in \mathcal{B}$. It holds that $Q(f = g) = 1$ (hence $P(f = g) = 1$) for two densities
$f, g$ of $P$ relative to $Q$.

As usual,

$$L_1(Q) := L_1(\Omega, \mathcal{B}, Q)$$

denotes the (linear) space of $Q$-integrable functions on $(\Omega, \mathcal{B})$ modulo the subspace of
functions that are null almost everywhere. For technical convenience, we will sometimes
identify elements $f \in L_1(Q)$ with their representatives $f : \Omega \to \mathbb{R}$. As a
consequence we have that $f = g$ in $L_1(Q)$ if and only if $Q(f = g) = 1$ for their
representatives. That is, equality is in an almost-everywhere sense for the representatives.
Therefore, in $L_1(Q)$, a density is unique. Furthermore, $L_1(Q)$ can be equipped with the
norm

$$\|f\|_1 := \int |f| \, dQ.$$

See standard textbooks (e.g. [15]) for details.

In this language, Skorokhod weak convergence has a useful characterisation. 

Theorem 2 ([16]). Let $(P_n)_{n \in \mathbb{N}}$, $P$ be probability measures. Then the following
statements are equivalent:

(i) The $P_n$ converge Skorokhod weakly to $P$.

(ii) There is a probability measure $Q$, which dominates $P$ and all of the $P_n$ such that
the densities $f_n := \frac{dP_n}{dQ}$ converge stochastically to the density $f := \frac{dP}{dQ}$, that is

$$\forall \epsilon \in \mathbb{R}_+: \quad Q(\{\omega : |f_n(\omega) - f(\omega)| > \epsilon\}) \xrightarrow[n \to \infty]{} 0.$$

(iii) There is a probability measure $Q$, which dominates $P$ and all of the $P_n$ such that
the densities $f_n := \frac{dP_n}{dQ}$ converge in mean (in $L_1(Q)$) to the density $f := \frac{dP}{dQ}$, that
is

$$\int |f_n - f| \, dQ \xrightarrow[n \to \infty]{} 0.$$

Proof. See [16], pp. 6-7. □

3.2 Krengel's theorem 

In a few words, the stochastic ergodic theorem of Krengel states that the averages of
densities which are obtained by iterative applications of a positive contraction in $L_1(Q)$
converge stochastically to a density that is invariant with respect to the positive
contraction.

To be more precise, let $(\Omega, \mathcal{B}, P)$ be a measure space and $U$ a positive contraction on
$L_1(\Omega, \mathcal{B}, P)$, that is, $Uf \geq 0$ for $f \geq 0$ (positivity) and $\|Uf\|_1 \leq \|f\|_1$ (contraction).
Then $\Omega$ can be decomposed into two disjoint subsets (uniquely determined up to
$P$-nullsets)

$$\Omega = C \,\dot{\cup}\, D$$

where $C$ is the maximal support of an $f_0 \in L_1(\Omega, \mathcal{B}, P)$ with $Uf_0 = f_0$. In other words,
for all $f \in L_1$ with $Uf = f$, we have $f = 0$ on $D$ and there is an $f_0 \in L_1$ such that both
$Uf_0 = f_0$ and $f_0 > 0$ on $C$ (see [21], p. 141 ff. for details). Krengel's theorem then
reads as follows.

Theorem 3 (Stochastic ergodic theorem; Krengel). If $U$ is a positive contraction on
$L_1$ of a $\sigma$-finite measure space $(\Omega, \mathcal{B}, Q)$ (e.g. a probability space; the definition of
a $\sigma$-finite measure space [15] is not further needed here) then, for any $f \in L_1$, the
averages

$$A_n f := \frac{1}{n} \sum_{i=0}^{n-1} U^i f$$

converge stochastically to a $U$-invariant $\bar{f}$. Moreover, on $C$ we have $L_1$-convergence,
whereas on $D$ the $A_n f$ converge stochastically to 0. If $f \geq 0$ then

$$\bar{f} = \liminf_{n \to \infty} A_n f \quad \text{in } L_1(Q). \tag{2}$$

Proof. [21], p. 143. □

3.3 Finite Signed Measures 

Let $(\Omega, \mathcal{B})$ be a measurable space. A finite signed measure is a $\sigma$-additive, but not
necessarily positive, finite set function on $\mathcal{B}$. The theorem of the Jordan decomposition
([15], p. 120 ff.) states that $P = P_+ - P_-$ for measures $P_+, P_-$. These measures are
uniquely determined insofar as if $P = P_1 - P_2$ for measures $P_1, P_2$ then there is a
measure $\delta$ such that

$$P_1 = P_+ + \delta \quad \text{and} \quad P_2 = P_- + \delta. \tag{3}$$

$P_+$, $P_-$ and $|P| := P_+ + P_-$ are called positive, negative and total variation of $P$. We
further define

$$\|P\|_{TV} := |P|(\Omega).$$

By "eventwise" addition and scalar multiplication the set of finite signed measures can
be made a normed vector space equipped with the norm of total variation $\|\cdot\|_{TV}$,
written $(\mathcal{P}, \|\cdot\|_{TV})$ or simply $\mathcal{P}$. The following observation about signed measures and
measurable functions is crucial for this work.

Lemma 2. Let $P$ be a finite signed measure on $(\Omega, \mathcal{B})$ and $T : \Omega \to \Omega$ a measurable
function. Then $P \circ T^{-1}$ is a finite signed measure for which

$$|P \circ T^{-1}|(B) \leq |P|(T^{-1}B)$$

for all $B \in \mathcal{B}$. In particular, $\|P \circ T^{-1}\|_{TV} \leq \|P\|_{TV}$.

Proof. Note that $P \circ T^{-1} = P_+ \circ T^{-1} - P_- \circ T^{-1}$ is a decomposition into a difference
of measures. Because of the uniqueness property of the Jordan decomposition (3), there
is a measure $\delta$ such that $P_+ \circ T^{-1} = (P \circ T^{-1})_+ + \delta$ and $P_- \circ T^{-1} = (P \circ T^{-1})_- + \delta$.
Therefore $|P \circ T^{-1}|(B) = (P \circ T^{-1})_+(B) + (P \circ T^{-1})_-(B) \leq P_+(T^{-1}B) +
P_-(T^{-1}B) = |P|(T^{-1}B)$. $B = \Omega$ yields the last assertion, as $T^{-1}\Omega = \Omega$. □
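
The inequality of lemma 2 can be strict, because positive and negative mass may cancel under $T$. The following is a minimal sketch on a three-point space, which is an assumption for illustration; on a finite space the total variation $|P|(B)$ of a signed measure with point masses `p[x]` is the sum of `abs(p[x])` over `x` in `B`. The helper names are hypothetical.

```python
from itertools import combinations

def pushforward(p, T):
    """Return the signed measure P o T^{-1} as a list of point masses."""
    q = [0] * len(p)
    for x, mass in enumerate(p):
        q[T[x]] += mass
    return q

def variation(p, B):
    """Total variation |P|(B) of a signed measure on a finite space."""
    return sum(abs(p[x]) for x in B)

def preimage(T, B):
    """The set T^{-1} B."""
    return [x for x in range(len(T)) if T[x] in B]

# Example (assumed values): masses +1 and -1 are both sent to the point 1 and cancel.
P = [1, -1, 0]
T = [1, 1, 2]
image = pushforward(P, T)

events = [B for r in range(len(P) + 1) for B in combinations(range(len(P)), r)]
lemma_holds = all(variation(image, B) <= variation(P, preimage(T, B)) for B in events)
```

Here the push-forward is the zero measure, so $\|P \circ T^{-1}\|_{TV} = 0$ while $\|P\|_{TV} = 2$, and the inequality holds for every event of the toy space.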

We finally observe the following well known relationship between signed measures
dominated by a measure $Q$ and $L_1(Q)$. To this end, as usual (e.g. [15]), we say that a
finite, signed measure $P$ is dominated by $Q$ if its total variation is, that is, $|P| \ll Q$.
Note that the set $\mathcal{P}_Q$ of finite, signed measures that are dominated by $Q$ is a linear
subspace of $\mathcal{P}$.

Lemma 3. Let $Q$ be a measure on the measurable space $(\Omega, \mathcal{B})$ and $\mathcal{P}_Q$ be the linear
space of the finite signed measures that are dominated by $Q$. If $P_f(B) := \int_B f \, dQ$ for
$f \in L_1(Q)$, then

$$\Phi : (L_1(Q), \|\cdot\|_1) \longrightarrow (\mathcal{P}_Q, \|\cdot\|_{TV}), \qquad f \mapsto P_f$$

establishes an isometry of normed vector spaces.

Proof. This is a consequence of the theorem of Radon-Nikodym, see [15], p. 128 ff. If
$P$ is a finite signed measure with $|P| \ll Q$ then also $P_+, P_- \ll Q$. Define $\Psi(P) :=
\frac{dP_+}{dQ} - \frac{dP_-}{dQ} \in L_1(Q)$ as the difference of the densities of $P_+$, $P_-$ relative to $Q$. Then $\Psi$
is just the inverse of $\Phi$. It is straightforward to check that $\|f\|_1 = \|\Phi(f)\|_{TV}$. □



4 Proof of Lemma 1 

We start by illustrating one of the core techniques of this work. Let $(\Omega, \mathcal{B})$ be a
measurable space and $(Q_n)_{n \in \mathbb{N}}$ be a countable collection of probability measures on it. Then
the set function defined by

$$Q(B) := \sum_{n \geq 0} 2^{-n-1} Q_n(B) \quad \forall B \in \mathcal{B} \tag{4}$$

is a probability measure which dominates all of the $Q_n$ [16].

Let now $(\Omega, \mathcal{B}, P, T)$ be such that $P$ is an AMS measure relative to the measurable
$T : \Omega \to \Omega$. Define further $P_n$ to be the measures given by

$$P_n(B) = \frac{1}{n} \sum_{i=0}^{n-1} P(T^{-i}B) \tag{5}$$

for $B \in \mathcal{B}$. As a consequence of (4), the set function $Q$ defined by

$$Q(B) := \frac{1}{2} \Big( \bar{P}(B) + \sum_{n \geq 0} 2^{-n-1} P(T^{-n}B) \Big) \tag{6}$$

for $B \in \mathcal{B}$ is a probability measure which dominates all of the $P \circ T^{-n}$ as well as $\bar{P}$.
Hence it also dominates all of the $P_n$. Accordingly, we write

$$f_n := \frac{dP_n}{dQ} \quad \text{and} \quad \bar{f} := \frac{d\bar{P}}{dQ} \tag{7}$$

for the respective densities. Lemma 1 can be obtained as a corollary of the following
result.

Lemma 4. Let $P$ be an AMS probability measure on $(\Omega, \mathcal{B})$ relative to $T$ with
stationary mean $\bar{P}$. Let $P_n$, $Q$, $f_n$ and $\bar{f}$ be as defined by equations (5), (6) and (7). Then the $f_n$
converge stochastically to the density $\bar{f} := \frac{d\bar{P}}{dQ}$. Moreover,

$$\bar{f} = \liminf_{n \to \infty} f_n \quad Q\text{-a.e.} \tag{8}$$



Proof. Let $f_1 = \frac{dP}{dQ}$. The road map of the proof is to construct a positive contraction $U$
on $L_1(Q)$ such that

$$f_n = \frac{1}{n} \sum_{i=0}^{n-1} U^i f_1 =: A_n f_1.$$

As a consequence of Krengel's theorem we will obtain that the $f_n$ converge
stochastically to a $U$-invariant limit $f^*$. In a final step we will show that indeed $f^* = \bar{f}$ in $L_1(Q)$
(i.e. $Q$-a.e.), which completes the proof.

Our endomorphism $U$ on $L_1(Q)$ is induced by the measurable function $T$. Let $f \in
L_1(Q)$. We first recall that, by lemma 3, the set function $\Phi(f)$ given by

$$\Phi(f)(B) := \int_B f \, dQ$$

for $B \in \mathcal{B}$ and $f \in L_1(Q)$ is a finite, signed measure on $(\Omega, \mathcal{B})$ whose total variation
$|\Phi(f)|$ is dominated by $Q$.
We would like to define

$$Uf := \Phi^{-1}(\Phi(f) \circ T^{-1})$$

which would be obviously linear. However, $\Phi^{-1}$ is only defined on $\mathcal{P}_Q$, that is, for finite
signed measures that are dominated by $Q$. Therefore, we have to show that $\Phi(f) \circ T^{-1} \in
\mathcal{P}_Q$ which translates to demonstrating that $|\Phi(f) \circ T^{-1}| \ll Q$. This does not hold in
general (see [21]). However, in the special case of the dominating $Q$ chosen here, it can
be proven.

To see this let $B$ be such that $|\Phi(f) \circ T^{-1}|(B) > 0$; we have to show that $Q(B) > 0$.
Because of lemma 2

$$|\Phi(f)|(T^{-1}B) \geq |\Phi(f) \circ T^{-1}|(B) > 0.$$

As $|\Phi(f)| \ll Q$, we obtain $Q(T^{-1}B) > 0$. By definition of $Q$ we thus either find
an $N_0 \in \mathbb{N}$ such that $0 < P(T^{-N_0}(T^{-1}B)) = P(T^{-N_0-1}B)$ or we have that $0 <
\bar{P}(T^{-1}B) = \bar{P}(B)$ because of the stationarity of $\bar{P}$. Both cases imply $Q(B) > 0$
which we had to show.

If $f \geq 0$ then $\Phi(f)$ is a measure. Hence also $\Phi(f) \circ T^{-1}$ is a measure which in
turn implies $Uf = \frac{d(\Phi(f) \circ T^{-1})}{dQ} \geq 0$. Hence $U$ is positive. It is also a contraction with
respect to the $L_1$-norm $\|\cdot\|_1$, as, because of the lemmata 2 and 3,

$$\|Uf\|_1 = \|\Phi(f) \circ T^{-1}\|_{TV} \leq \|\Phi(f)\|_{TV} = \|f\|_1.$$


For $f_1 = \frac{dP}{dQ}$ being the density of $P$ relative to $Q$ we obtain

$$U^n f_1 = \frac{d(P \circ T^{-n})}{dQ}.$$

Hence the $f_n := A_n f_1 = \frac{1}{n} \sum_{i=0}^{n-1} U^i f_1$ are the densities of the $P_n = \frac{1}{n} \sum_{i=0}^{n-1} P \circ
T^{-i}$ relative to $Q$. An application of Krengel's theorem 3 then shows that the $A_n f_1$
converge stochastically to a $U$-invariant limit $f^* \in L_1(Q)$. Note that a positive
$U$-invariant $f$ just corresponds to a stationary measure.

It remains to show that $\bar{f} = f^*$ in $L_1(Q)$ or, equivalently, $\bar{f} = f^*$ $Q$-a.e. for
their representatives (see the discussions in subsection 3.1). Let $D$, as described in
subsection 3.2, be the complement of the maximal support of a $U$-invariant $g \in L_1(Q)$.
We recall that stationary measures are identified with positive, $U$-invariant elements of
$L_1(Q)$. Therefore, $\bar{f} = \frac{d\bar{P}}{dQ}$ is $U$-invariant which yields

$$Q(\{\bar{f} > 0\} \cap D) = 0$$

which implies $\bar{f} = 0$ $Q$-a.e. on $D$. Due to Krengel's theorem, it holds that also $f^* = 0$
$Q$-a.e. on $D$, and we obtain that

$$\bar{f} = 0 = f^* \quad Q\text{-a.e. on } D.$$
In order to conclude that

$$\bar{f} = f^* \quad Q\text{-a.e. on } C$$

it remains to show that $\int_B f^* \, dQ = \int_B \bar{f} \, dQ$ for events $B \subset C = \Omega \setminus D$, as two
integrable functions coincide almost everywhere if their integrals over arbitrary events
coincide ([15]), with which we will have completed the proof. From Krengel's theorem
we know that, on $C$, we have $L_1$-convergence of the $f_n$:

$$\lim_{n \to \infty} \int_C |f_n - f^*| \, dQ = 0. \tag{9}$$

Therefore, for $B \subset C$,

$$\int_B f^* \, dQ \overset{(9)}{=} \lim_{n \to \infty} \int_B f_n \, dQ = \lim_{n \to \infty} P_n(B) \overset{(**)}{=} \bar{P}(B) = \int_B \bar{f} \, dQ,$$

where (**) follows from the asymptotic mean stationarity of $P$. We thus have
completed the proof of the main statement of the lemma.

Finally, (8) is a direct consequence of (2) in Krengel's theorem. □

In sum, we have shown that there is a measure $Q$ that dominates all of the $P_n$ as
well as $\bar{P}$ such that the densities of the $P_n$ converge stochastically to the density of $\bar{P}$.
According to theorem 2, this is equivalent to Skorokhod weak convergence. Hence we
obtain lemma 1 as a corollary.
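
The construction of this section can be traced on a finite toy system. The following is a minimal sketch under assumptions that are not part of the paper: a three-point space, a deterministic map $T$ given as a lookup list, $P$ the point mass at 0, and a version of (6) that is truncated after finitely many terms (so its total mass is slightly below one, which does not affect domination). The helper names are hypothetical. The last function evaluates $\int |f_n - \bar{f}| \, dQ$ through the point-wise densities.

```python
from fractions import Fraction

def pushforward(p, T):
    """Return P o T^{-1}: the mass sitting at point x is moved to T[x]."""
    q = [Fraction(0)] * len(p)
    for x, mass in enumerate(p):
        q[T[x]] += mass
    return q

def cesaro_mean(p, T, n):
    """P_n from (5): (1/n) * sum_{i=0}^{n-1} P o T^{-i}."""
    total = [Fraction(0)] * len(p)
    current = list(p)
    for _ in range(n):
        total = [t + c for t, c in zip(total, current)]
        current = pushforward(current, T)
    return [t / n for t in total]

def dominating_measure(p, p_bar, T, terms=60):
    """Truncated version of (6): 1/2 * (P_bar + sum_{n < terms} 2^{-n-1} P o T^{-n})."""
    q = [Fraction(1, 2) * m for m in p_bar]
    current = list(p)
    for n in range(terms):
        weight = Fraction(1, 2 ** (n + 2))      # equals 1/2 * 2^{-n-1}
        q = [a + weight * c for a, c in zip(q, current)]
        current = pushforward(current, T)
    return q

def l1_distance(p_n, p_bar, q):
    """Integral of |f_n - f_bar| dQ, with densities f = (point mass) / Q(point)."""
    total = Fraction(0)
    for a, b, m in zip(p_n, p_bar, q):
        if m > 0:                               # points of Q-mass zero do not contribute
            total += abs(a / m - b / m) * m
    return total

T = [1, 2, 1]                                           # 0 -> 1 -> 2 -> 1 -> ...
P = [Fraction(1), Fraction(0), Fraction(0)]             # point mass at 0
P_bar = [Fraction(0), Fraction(1, 2), Fraction(1, 2)]   # stationary mean of P

Q = dominating_measure(P, P_bar, T)
dist_10 = l1_distance(cesaro_mean(P, T, 10), P_bar, Q)
dist_1000 = l1_distance(cesaro_mean(P, T, 1000), P_bar, Q)
```

In this toy system $Q$ gives positive mass to every point, so it dominates $P$, all $P_n$ and $\bar{P}$, and the $L_1(Q)$ distance between the densities drops from $1/5$ at $n = 10$ to $1/500$ at $n = 1000$, in line with the convergence in mean of theorem 2.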

5 Preliminaries II 

In this section we will first review a couple of additional definitions that are necessary
for a proof of theorem 1. In subsection 5.1 we give the definition of a standard space.
The beneficial properties of standard spaces become apparent in subsection 5.2, where
we briefly review conditional probabilities and expectation.

5.1 Standard spaces 

See [25], ch. 3 or [12] for thorough treatments of standard spaces. In the following, a
field $\mathcal{F}$ is a collection of subsets of a set $\Omega$ that contains $\Omega$ and is closed with respect to
complements and finite unions.

Definition 2. A field $\mathcal{F}$ on a set $\Omega$ is said to have the countable extension property if
the following two conditions are met.

1. $\mathcal{F}$ has a countable number of elements.

2. Every nonnegative and finitely additive set function $P$ on $\mathcal{F}$ is continuous at $\emptyset$, that
is, for a sequence of elements $F_n \in \mathcal{F}$ with $F_{n+1} \subset F_n$ such that $\bigcap_n F_n = \emptyset$ we
have $\lim_{n \to \infty} P(F_n) = 0$.

Definition 3. A measurable space $(\Omega, \mathcal{B})$ is called a standard space, if the $\sigma$-algebra
$\mathcal{B}$ is generated by a field $\mathcal{F}$ which has the countable extension property.

Remark 2.

1. Most of the prevalent examples of measurable spaces in practice are standard. For
example, any measurable space which is generated by a complete, separable, metric
space (i.e. a Polish space) is standard. Moreover, standard spaces can be
characterized as being isomorphic to subspaces $(B, B \cap \mathcal{B})$ of Polish spaces $(\Omega, \mathcal{B})$ where
$B \in \mathcal{B}$ is a measurable set (see [25], ch. 3).

2. An alternative characterisation of standard spaces is that the $\sigma$-algebra $\mathcal{B}$ possesses
a basis. See [18], app. 6, for a discussion.

5.2 Conditional Probability and Expectation 

See [25], ch. 6 or [12] for a discussion of conditional probability and expectation. 

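Definition 4. Let $P$ be a probability measure on a measurable space $(\Omega, \mathcal{B})$ and let
$\mathcal{G} \subset \mathcal{B}$ be a sub-$\sigma$-algebra of $\mathcal{B}$. A function

$$\delta(\cdot, \cdot) : \mathcal{B} \times \Omega \to \mathbb{R}$$

is called a (version of the) conditional probability of $P$ given $\mathcal{G}$, if

(CP1) $\delta(B, \cdot)$ is $\mathcal{G}$-measurable for all $B \in \mathcal{B}$ and
(CP2)

$$P(B \cap G) = \int_G \delta(B, \omega) \, dP(\omega)$$

for all $G \in \mathcal{G}$, $B \in \mathcal{B}$.

$\delta(\cdot, \cdot)$ is called a (version of the) regular conditional probability of $P$ given $\mathcal{G}$, if in
addition to (CP1) and (CP2),

(RCP) $\delta(\cdot, \omega)$ is a probability measure on $\mathcal{B}$ for all $\omega \in \Omega$.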

We collect a couple of basic results about conditional probabilities. See [25] or [12] 
for details. 

1. Let $\gamma, \delta$ be two versions of the conditional probability of $P$ given $\mathcal{G}$. Then the
$\mathcal{G}$-measurable functions $\gamma(B, \cdot), \delta(B, \cdot)$ agree almost everywhere for any given $B \in
\mathcal{B}$, that is, we have

$$\forall B \in \mathcal{B}: \quad P(\{\omega \mid \gamma(B, \omega) = \delta(B, \omega)\}) = 1. \tag{10}$$

2. Conditional probabilities always exist. Existence of regular conditional
probabilities is not assured for arbitrary measurable spaces. However, for standard spaces
$(\Omega, \mathcal{B})$ existence can be proven.

3. Note that it cannot be shown for arbitrary measurable spaces that two versions $\delta, \gamma$
agree almost everywhere for all $B \in \mathcal{B}$, meaning that we do not have

$$P(\{\omega \mid \forall B \in \mathcal{B}: \gamma(B, \omega) = \delta(B, \omega)\}) = 1.$$

However, for standard spaces $(\Omega, \mathcal{B})$ this beneficial property applies:

Lemma 5. Let $(\Omega, \mathcal{B})$ be a measurable space such that $\mathcal{B}$ is generated by a countable
field $\mathcal{F}$. Let $P$ be a probability measure on it and assume that the regular conditional
probability of $P$ given a sub-$\sigma$-algebra $\mathcal{G}$ exists. If $\delta, \gamma$ are two versions of it then the
measures $\delta(\cdot, \omega)$ and $\gamma(\cdot, \omega)$ agree on a set of measure one, that is,

$$P(\{\omega \mid \forall B \in \mathcal{B}: \gamma(B, \omega) = \delta(B, \omega)\}) = 1.$$

We display the proof, as its (routine) arguments are needed in subsequent sections.

Proof. Enumerate the elements of $\mathcal{F}$ and write $F_k$ for element No. $k$. According to (10)
we find for each $k \in \mathbb{N}$ a set $B_k$ of $P$-measure one on which $\delta(F_k, \cdot)$ and $\gamma(F_k, \cdot)$ agree.
Hence, on $B := \bigcap_k B_k$, which is an event of $P$-measure one, all of the $\delta(F_k, \cdot)$ and the
$\gamma(F_k, \cdot)$ coincide. Thus the measures $\delta(\cdot, \omega)$ and $\gamma(\cdot, \omega)$ agree on a generating field of
$\mathcal{B}$ for $\omega \in B$. As a measure is uniquely determined by its values on a generating field
([15]), we obtain that the measures $\delta(\cdot, \omega)$ and $\gamma(\cdot, \omega)$ agree for $\omega \in B$, that is, $P$-almost
everywhere. □

We also give the definition of conditional expectations and point out their extra 
properties on standard spaces. 

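Definition 5. Let $(\Omega, \mathcal{B}, P)$ be a probability space and $f \in L_1(P)$. Let $\mathcal{G} \subset \mathcal{B}$ be a
sub-$\sigma$-algebra. If $h : \Omega \to \mathbb{R}$ is

1. $\mathcal{G}$-measurable and

2. for all $G \in \mathcal{G}$ it holds that $\int_G f \, dP = \int_G h \, dP$

we say that $h$ is a version of the conditional expectation of $f$ given $\mathcal{G}$ and write

$$h(\omega) = E(f \mid \mathcal{G})(\omega).$$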

Conditional expectations always exist. In case of standard spaces they have an extra 
property which we rely on. See [25], ch. 6 for proofs of the following results. 

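Theorem 4. Let $(\Omega, \mathcal{B}, P)$ be a probability space, $\mathcal{G}$ a sub-$\sigma$-algebra of $\mathcal{B}$ and $f \in
L_1(P)$. Then there exists a version $E(f \mid \mathcal{G})$ of the conditional expectation. In case of a
standard space $(\Omega, \mathcal{B})$ it holds that

$$E(f \mid \mathcal{G})(\omega) = \int f(x) \, d\delta_P(x, \omega) \tag{11}$$

where $\delta_P$ is a version of the regular conditional probability of $P$ given $\mathcal{G}$.

Corollary 1. Let $(\Omega, \mathcal{B})$ be a standard space, $P$ a probability measure on it and $f \in
L_1(P)$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra and $\delta_P$ the regular conditional probability of $P$ given
$\mathcal{G}$. Then $\omega \mapsto \int f \, d\delta_P(\cdot, \omega)$ is $\mathcal{G}$-measurable (hence also $\mathcal{B}$-measurable) and

$$\int_G f \, dP = \int_G \Big( \int f \, d\delta_P(\cdot, \omega) \Big) \, dP(\omega) \tag{12}$$

for all $G \in \mathcal{G}$.
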
6 Proof of Theorem 1

We recall the notations of section 2 and that, according to the assumptions of theorem 1,
$P$ is a measure on a standard space $(\Omega, \mathcal{B})$ that is AMS relative to the measurable
$T : \Omega \to \Omega$.



6.1 Sketch of the Proof Strategy 

The core idea for proving the theorem is to define the measures $P_\omega$ as being induced by
the regular conditional probability measures of $P$ given the invariant events $\mathcal{I}$. That is,
we define

$$\forall B \in \mathcal{B}: \quad P_\omega(B) := \delta_P(B, \omega) \tag{13}$$

where, here and in the following, $\delta$ refers to regular conditional probabilities given the
invariant events $\mathcal{I}$. Note that, for arbitrary probability measures $P$ on $(\Omega, \mathcal{B})$,

$$\delta_P(B, \omega) = \delta_P(B, T\omega), \tag{14}$$

as, otherwise, $\delta_P(B, \cdot)^{-1}(y)$ would not be an invariant set for $y := \delta_P(B, T\omega)$ which
would be a contradiction to the $\mathcal{I}$-measurability of $\delta_P(B, \cdot)$.

As a consequence of (14), we obtain property (a) of the theorem. Furthermore, (b)
is the defining property (CP2) of a regular conditional probability (see Def. 4) and (c)
is equation (12) from corollary 1 with $G = \Omega$. What remains to show is that, for $\omega$ in
an invariant set $E$ of $P$-measure one, the $P_\omega$ are ergodic and AMS.

We intend to do this by the following strategy. First, we recall that if, in theorem 1,
AMS is replaced by stationary, we obtain the well known result of the ergodic
decomposition of stationary measures (see the introduction for a discussion). If one follows
the lines of argumentation of its proof (see [10], th. 2.5) one sees that, on an invariant set
of $\bar{P}$-measure one, the $\bar{P}_\omega$ are just the regular conditional probabilities of the stationary
$\bar{P}$. Applying the ergodic decomposition of stationary measures to the stationary mean
$\bar{P}$ of $P$ provides us with an invariant set $\bar{E}$ of $\bar{P}$-measure 1 such that

$$\omega \in \bar{E} \implies \bar{P}_\omega := \delta_{\bar{P}}(\cdot, \omega) \text{ is stationary and ergodic.} \tag{15}$$

We will show that, on an invariant set $E \subset \bar{E}$ of $P$-measure one, the $P_{n,\omega}$ converge
Skorokhod weakly (hence strongly, see Def. 1) to the $\bar{P}_\omega$, which translates to that the
$P_\omega$ are AMS and have stationary means $\bar{P}_\omega$. As an AMS measure is ergodic if its
stationary mean is ergodic, we will have completed the proof.

Therefore, we will proceed according to the following steps:

Step 1 We construct measures $Q_\omega$ that dominate $\bar{P}_\omega$ and all of the

$$P_{n,\omega} := \frac{1}{n} \sum_{i=0}^{n-1} (P_\omega \circ T^{-i}), \quad n > 0 \tag{16}$$

(note that $P_\omega = P_{1,\omega}$), which will provide us with densities

$$f_{n,\omega} := \frac{dP_{n,\omega}}{dQ_\omega} \quad \text{and} \quad \bar{f}_\omega := \frac{d\bar{P}_\omega}{dQ_\omega} \tag{17}$$

for all $\omega$.



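Step 2 We construct positive contractions $U_\omega$ on $L_1(Q_\omega)$ such that

$$U_\omega \, \frac{d(P_\omega \circ T^{-n})}{dQ_\omega} = \frac{d(P_\omega \circ T^{-n-1})}{dQ_\omega}, \tag{18}$$

hence

$$A_n f_{1,\omega} := \frac{1}{n} \sum_{i=0}^{n-1} U_\omega^i f_{1,\omega} = f_{n,\omega}. \tag{19}$$

We apply Krengel's theorem (th. 3) to obtain that the $f_{n,\omega}$ converge stochastically to a
$U_\omega$-invariant $f^*_\omega$ as well as $f^*_\omega = \liminf_{n \to \infty} f_{n,\omega}$ in $L_1(Q_\omega)$.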

Step 3 We show that, for $\omega$ in an invariant set $E$ of $P$-measure one,

$$f^*_\omega = \bar{f}_\omega \quad \text{in } L_1(Q_\omega).$$

This completes the proof, as this states that the $P_{n,\omega}$ converge Skorokhod weakly to the
$\bar{P}_\omega$ for $\omega \in E$, hence that the $P_\omega$ are ergodic and AMS for $\omega$ in the invariant set $E$ of
$P$-measure one.



6.2 Step 1 

We recall definitions (5) and (6) of $P_n$ and $Q$. We define $Q_\omega$ as the probability measures
induced by the regular conditional probability of $Q$ given the invariant events $\mathcal{I}$, that is,

$$Q_\omega(B) := \delta_Q(B, \omega) \tag{20}$$

for $B \in \mathcal{B}$. It remains to show that, by choosing an appropriate version, $Q_\omega$ indeed
dominates all of the $P_\omega \circ T^{-n}$ (hence all of the $P_{n,\omega}$) as well as $\bar{P}_\omega$. This is established
by the following lemma whose merely technical proof has been deferred to appendix A.

Lemma 6.

$$\sigma(B, \omega) := \frac{1}{2} \Big( \bar{P}_\omega(B) + \sum_{n \geq 0} 2^{-n-1} P_\omega(T^{-n}B) \Big) \tag{21}$$

is a version of the regular conditional probability of $Q$ given $\mathcal{I}$.

Remark 3. In order to achieve that $Q_\omega$ dominates all of the $P_\omega \circ T^{-n}$ and $\bar{P}_\omega$ one could
have defined $Q_\omega$ directly via (21). However, the observation that $Q_\omega$ is induced by the
regular conditional probability of $Q$ given $\mathcal{I}$ is crucial for step 3.

6.3 Step 2

Construction of positive contractions $U_\omega$ on $L_1(Q_\omega)$ is achieved by, mutatis mutandis,
reiterating the arguments accompanying the construction of $U$ in the proof of lemma 4.
In more detail, we replace $P, \bar{P}, P_n, Q, f_n, \bar{f}$ there by $P_\omega, \bar{P}_\omega, P_{n,\omega}, Q_\omega, f_{n,\omega}, \bar{f}_\omega$ (we
recall (13), (15), (16), (20), (17) for the latter definitions) here. Note that choosing the
version of $Q_\omega$ according to lemma 6 ensures that $U_\omega$ indeed maps $L_1(Q_\omega)$ into $L_1(Q_\omega)$.
(18) and (19) then are a direct consequence of the definition of $U_\omega$. Finally,
application of Krengel's theorem 3 to the positive contraction $U_\omega$ on $L_1(Q_\omega)$ yields a
$U_\omega$-invariant $f^*_\omega$ to which the $f_{n,\omega}$ converge stochastically. Moreover, again by Krengel's
theorem,

$$f^*_\omega = \liminf_{n \to \infty} f_{n,\omega} \quad \text{in } L_1(Q_\omega). \tag{22}$$

6.4 Step 3 

We have to show that

$$f^*_\omega = \bar{f}_\omega \quad \text{in } L_1(Q_\omega)$$

for $\omega$ in an invariant set $E \subset \bar{E}$ with $Q(E) = 1$. In a first step, the following lemma
will provide us with a useful invariant $E^*$ where $E \subset E^* \subset \bar{E}$ and $Q(E^*) = 1$. We
further recall the definitions of $f_n$ and $\bar{f}$ as the densities of $P_n$ and $\bar{P}$ w.r.t. $Q$ (see (7)).
Without loss of generality, we choose representatives that are everywhere nonnegative.
Due to lemma 4,

$$\liminf_{n \to \infty} f_n = \bar{f} \quad \text{in } L_1(Q). \tag{23}$$

Lemma 7. There is an invariant set $E^*$ with $P(E^*) = Q(E^*) = 1$ such that, for
$\omega \in E^*$,

$$\liminf_{n \to \infty} f_n = \liminf_{n \to \infty} f_{n,\omega} \quad \text{in } L_1(Q_\omega) \tag{24}$$

and

$$\bar{f} = \bar{f}_\omega \quad \text{in } L_1(Q_\omega). \tag{25}$$

Proof. We have deferred the merely technical proof to appendix B. □

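We compute

$$
\begin{aligned}
\int_{E^*} \Big( \int |f^*_\omega - f_\omega| \, dQ_\omega \Big) dQ(\omega)
&= \int_{E^*} \Big( \int \big|\liminf_{n \to \infty} f_{n,\omega} - f_\omega\big| \, dQ_\omega \Big) dQ(\omega) \\
&= \int_{E^*} \Big( \int \big|\liminf_{n \to \infty} f_n - f\big| \, dQ_\omega \Big) dQ(\omega) \\
&\overset{(*)}{=} \int_{E^*} \big|\liminf_{n \to \infty} f_n - f\big| \, dQ = 0,
\end{aligned}
$$

where the first equation is (22), the second is lemma 7, the last is (23), and (*) follows from the defining properties of the conditional expectation $E(|\liminf_{n \to \infty} f_n - f| \mid \mathcal{I})$ in combination with theorem 4. According to the last computation, we find a set $E \subset E^*$ with $Q(E) = 1$ such that

$$\omega \in E \implies \int |f^*_\omega - f_\omega| \, dQ_\omega = 0.$$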
The invariance of the regular conditional probabilities (see (14)) involved in the definitions of $f^*_\omega$, $f_\omega$ implies

$$\int |f^*_\omega - f_\omega| \, dQ_\omega = 0 \iff \int |f^*_{T\omega} - f_{T\omega}| \, dQ_{T\omega} = 0.$$

It follows that $E$ is invariant, so that $E$ meets the requirements of theorem 1. □
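
The passage in step 3 from a vanishing integral over $E^*$ to a set $E$ of full measure is a standard Markov-inequality argument. A minimal sketch follows; the auxiliary function $h$ is notation introduced only for this illustration, and its measurability in $\omega$ is assumed.

```latex
% Auxiliary (hypothetical) notation for the inner integral:
h(\omega) := \int |f^*_\omega - f_\omega| \, dQ_\omega \;\ge\; 0,
\qquad
\int_{E^*} h \, dQ = 0 .

% Markov's inequality on E^*, for every integer k \ge 1:
Q\big(\{\omega \in E^* : h(\omega) \ge 1/k\}\big)
  \;\le\; k \int_{E^*} h \, dQ \;=\; 0 .

% Countable union over k:
Q\big(\{\omega \in E^* : h(\omega) > 0\}\big)
  \;\le\; \sum_{k \ge 1} Q\big(\{\omega \in E^* : h(\omega) \ge 1/k\}\big) \;=\; 0 .

% Hence the set
E := \{\omega \in E^* : h(\omega) = 0\}
% satisfies
Q(E) = Q(E^*) = 1 .
```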

7 Discussion 

We have demonstrated how to decompose AMS random sources, which encompass a 
large variety of sources of practical interest, into ergodic components. The result comes 
in the tradition of the ergodic decomposition of stationary sources. As outlined in the 
introduction, this substantially added to source coding theory by facilitating the gener- 
alization of a variety of prominent theorems to arbitrary, not necessarily ergodic, sta- 
tionary sources. 

Our result can be expected to yield similar contributions to the theory of AMS 
sources. An immediate clue is that the theorems developed in [10] for two-sided AMS 
sources are now valid for arbitrary AMS sources by replacing theorem 2.6 there by 
theorem 1 here. 

Moreover, a couple of relevant quantities in information theory (e.g. entropy rate) are affine functionals that are upper semicontinuous w.r.t. the space of stationary random sources, equipped with the weak topology. Jacobs' theory of such functionals ([17], see also [5], th. 4) immediately builds on the ergodic decomposition of stationary sources. This theory should now be extendable to AMS sources.

We finally would like to mention that a certain class of source coding theorems for AMS sources was obtained by partially circumventing the lack of an ergodic decomposition. Schematically, this was done by a reduction from AMS sources to their stationary means and subsequent application of the ergodic decomposition for stationary sources in order to further reduce to ergodic sources. In these cases, our contribution would only be to simplify the theorems' statements and thus a merely aesthetic one. However, in the remaining cases where the reduction from asymptotic mean stationarity to stationarity is not applicable, our result will be essential. The full exploration of related consequences seems to be a worthwhile undertaking.

A Proof of lemma 6

In the following, according to the assumptions of theorem 1, $P$ is a measure on a standard space $(\Omega, \mathcal{B})$ that is AMS relative to the measurable $T : \Omega \to \Omega$. We further recall the notations of section 2 as well as equations (5) and (6) for the necessary definitions.

Lemma 8. Let $g : \Omega \to \mathbb{R}$ be a $T$-invariant (that is, $g(\omega) = g(T\omega)$ for all $\omega \in \Omega$), measurable function. Then it holds that

$$\int g \, dP = \int g \, d(P \circ T^{-n}) = \int g \, dP_n = \int g \, d\bar{P} = \int g \, dQ. \tag{26}$$

In particular, all of the integrals exist if one of the integrals exists.

Proof. Note that $Q$ and all of the $P \circ T^{-n}$ and $P_n$, like $P$, are AMS with stationary mean $\bar{P}$, which is an obvious consequence of their definitions. Therefore, the claim of the lemma follows from the intuitively obvious observation that $\int g \, dP = \int g \, d\bar{P}$ for invariant $g$ and general AMS $P$ with stationary mean $\bar{P}$. See [12] for details. □
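
Most of the equalities in (26) can also be verified by hand once $\int g \, dP = \int g \, d\bar{P}$ is granted. The following worked sketch does so for nonnegative invariant $g$; it assumes the forms $P_n = \frac{1}{n}\sum_{t=0}^{n-1} P \circ T^{-t}$ and $Q = \frac{1}{2}\big(\bar{P} + \sum_{n \ge 0} 2^{-n-1} P \circ T^{-n}\big)$ for definitions (5) and (6).

```latex
% Change of variables and invariance g \circ T^n = g:
\int g \, d(P \circ T^{-n}) = \int g \circ T^{n} \, dP = \int g \, dP .

% Averaging over t = 0, \dots, n-1:
\int g \, dP_n
  = \frac{1}{n} \sum_{t=0}^{n-1} \int g \, d(P \circ T^{-t})
  = \int g \, dP .

% Mixture Q (sum and integral interchanged by monotone convergence, g \ge 0):
\int g \, dQ
  = \tfrac{1}{2} \int g \, d\bar{P}
    + \sum_{n \ge 0} 2^{-n-2} \int g \, d(P \circ T^{-n})
  = \tfrac{1}{2} \int g \, d\bar{P} + \tfrac{1}{2} \int g \, dP .

% With \int g \, d\bar{P} = \int g \, dP (the step taken from [12]):
\int g \, dQ = \int g \, dP .
```

The only step not reduced to elementary manipulations is the identity between $P$ and its stationary mean on invariant functions, which is the content of the reference cited in the proof.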

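Lemma 9. The functions

$$\zeta_n(B, \omega) := \delta_P(T^{-n}B, \omega)$$

are versions of the regular conditional probabilities $\delta_{P \circ T^{-n}}$ of the $P \circ T^{-n}$ given $\mathcal{I}$.

Proof. The functions $\zeta_n(\cdot, \omega)$ are probability measures for fixed $\omega \in \Omega$ (this is (RCP) of definition 4) as the $P_\omega$ are, by the definition of $\delta_P$. Again by the definition of $\delta_P$, $\zeta_n(B, \cdot)$ is also $\mathcal{I}$-measurable in $\omega$ for fixed $B \in \mathcal{B}$, which is (CP1) of definition 4. For $I \in \mathcal{I}$ and $B \in \mathcal{B}$ we compute

$$
\begin{aligned}
\int_I \delta_P(T^{-n}B, \omega) \, d(P \circ T^{-n})(\omega)
&= \int_I \delta_P(T^{-n}B, \omega) \, dP(\omega) \\
&= P(I \cap T^{-n}B) = P(T^{-n}(I \cap B)) \\
&= \int_I \delta_{P \circ T^{-n}}(B, \omega) \, d(P \circ T^{-n})(\omega),
\end{aligned}
$$

where the first equation follows from the invariance of the integrands and lemma 8, the second is (CP2) for $\delta_P$, and the third uses $T^{-n}I = I$. We have thus shown (CP2) of definition 4. □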

We recall that, for lemma 6, we have to show that

$$\alpha(B, \omega) = \frac{1}{2}\Big(\bar{P}_\omega(B) + \sum_{n \ge 0} 2^{-n-1} P_\omega(T^{-n}B)\Big)$$

is a version of the regular conditional probability $\delta_Q$. Note first that $\bar{P}_\omega$, according to our proof strategy outlined in subsection 6.1, was defined as $\delta_{\bar{P}}(\cdot, \omega)$ where $\delta_{\bar{P}}$ is the regular conditional probability of the stationary mean $\bar{P}$. Furthermore, as a consequence of lemma 9, we can identify the $P_\omega \circ T^{-n}$ with $\delta_{P \circ T^{-n}}(\cdot, \omega)$ and write

$$\alpha(B, \omega) = \frac{1}{2}\Big(\delta_{\bar{P}}(B, \omega) + \sum_{n \ge 0} 2^{-n-1} \delta_{P \circ T^{-n}}(B, \omega)\Big). \tag{27}$$

We will then exploit the defining properties of the $\delta$'s to finally show that $\alpha$ is a version of $\delta_Q$.

Proof of lemma 6. We have to check properties (RCP), (CP1) and (CP2) of definition 4.

(RCP): That $\alpha(\cdot, \omega)$ is a probability measure for fixed $\omega$ follows from an argumentation which is completely analogous to that at the beginning of section 4, surrounding equations (4) and (6).

(CP1): As all of the $\delta$'s involved in (27) are invariant in $\omega$ (see (14)), we know that $\alpha(B, \cdot)$ is measurable w.r.t. $\mathcal{I}$ for any $B \in \mathcal{B}$, which is (CP1) of definition 4.

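(CP2): Fix $B \in \mathcal{B}$ and consider the functions

$$\varsigma_n(\omega) := \frac{1}{2}\Big(\delta_{\bar{P}}(B, \omega) + \sum_{k=0}^{n} 2^{-k-1} \delta_{P \circ T^{-k}}(B, \omega)\Big).$$

This is an increasing sequence of nonnegative measurable functions which converges everywhere to the values $\alpha(B, \omega)$. Because of (14) the summands of $\varsigma_n$ are invariant. As all of the summands are also integrable with respect to some $P \circ T^{-k}$ or $\bar{P}$, they are also integrable with respect to $Q$, due to lemma 8. Therefore, the $\varsigma_n$ are integrable with respect to $Q$ as well. The monotone convergence theorem of Beppo Levi (e.g. [15]) reveals that $\alpha(B, \cdot)$ is integrable, too, and further, for $I \in \mathcal{I}$ and $B \in \mathcal{B}$:

$$
\begin{aligned}
\int_I \alpha(B, \omega) \, dQ(\omega)
&= \int_I \lim_{n \to \infty} \frac{1}{2}\Big(\delta_{\bar{P}}(B, \omega) + \sum_{k=0}^{n} 2^{-k-1} \delta_{P \circ T^{-k}}(B, \omega)\Big) dQ(\omega) \\
&\overset{(a)}{=} \lim_{n \to \infty} \int_I \frac{1}{2}\Big(\delta_{\bar{P}}(B, \omega) + \sum_{k=0}^{n} 2^{-k-1} \delta_{P \circ T^{-k}}(B, \omega)\Big) dQ(\omega) \\
&\overset{(b)}{=} \lim_{n \to \infty} \frac{1}{2}\Big(\int_I \delta_{\bar{P}}(B, \omega) \, d\bar{P}(\omega) + \sum_{k=0}^{n} 2^{-k-1} \int_I \delta_{P \circ T^{-k}}(B, \omega) \, d(P \circ T^{-k})(\omega)\Big) \\
&\overset{(c)}{=} \lim_{n \to \infty} \frac{1}{2}\Big(\bar{P}(I \cap B) + \sum_{k=0}^{n} 2^{-k-1} P(T^{-k}(I \cap B))\Big) \\
&= \frac{1}{2}\Big(\bar{P}(I \cap B) + \sum_{n \ge 0} 2^{-n-1} P(T^{-n}(I \cap B))\Big) \\
&= Q(I \cap B),
\end{aligned}
$$

where (a) follows from Beppo Levi's theorem, (b) follows from the invariance of the $\delta$'s and subsequent application of lemma 8, and (c) is just the defining property (CP2) of the conditional probabilities $\delta$ (definition 4). We thus have shown property (CP2) for $\alpha$. □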

B Proof of Lemma 7

According to the assumptions of theorem 1, $P$ is a measure on a standard space $(\Omega, \mathcal{B})$ that is AMS relative to the measurable $T : \Omega \to \Omega$. We further recall the notations of section 2 as well as equations (5), (6), (7), (13), (15), (16), (17), (20) and the surrounding texts for the necessary definitions. We further recall that, without loss of generality, we had chosen representatives of the $f_n$ and $f$ that are everywhere nonnegative. The following lemma will deliver the technical key to lemma 7.

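Lemma 10. For each $1 \le n \in \mathbb{N}$ there is an invariant $E_n \in \mathcal{I} \subset \mathcal{B}$ with $P_n(E_n) = Q(E_n) = 1$ such that

$$\omega \in E_n \implies f_{n,\omega} = f_n \quad \text{in } L_1(Q_\omega).$$

There is also an invariant $E_\infty$ with $\bar{P}(E_\infty) = Q(E_\infty) = 1$ such that

$$\omega \in E_\infty \implies f_\omega = f \quad \text{in } L_1(Q_\omega).$$

Loosely speaking, the lemma reveals that the $f_n$ and the $f_{n,\omega}$ as well as $f$ and $f_\omega$ agree $Q_\omega$-a.e. for $Q$-almost all $\omega \in \Omega$. This means that, for $Q$-almost all $\omega$, they are equal on the parts of $\Omega$ considered relevant by the measures $Q_\omega$.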

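Proof. Consider the functions

$$\beta_n(B, \omega) := \int_B f_{n,\omega} \, dQ_\omega \quad \text{and} \quad \gamma_n(B, \omega) := \int_B f_n \, dQ_\omega.$$

By the definition of a density,

$$\delta_{P_n}(B, \omega) = \int_B f_{n,\omega} \, dQ_\omega.$$

Hence $\beta_n(B, \omega)$ is just the regular conditional probability of $P_n$ given $\mathcal{I}$. We now show that $\gamma_n$ is a version of the conditional probability of $P_n$ given $\mathcal{I}$ (but not necessarily a regular one). Note first that the $\gamma_n(B, \cdot)$ are $\mathcal{I}$-measurable as, according to (11), we have that $\gamma_n(B, \omega)$ agrees with the conditional expectation $E_Q(1_B f_n \mid \mathcal{I})(\omega)$, which, by definition, is $\mathcal{I}$-measurable. Second, we observe that, for $I \in \mathcal{I}$ and $B \in \mathcal{B}$, as $\gamma_n$ is invariant in $\omega$ (*),

$$
\begin{aligned}
\int_I \gamma_n(B, \omega) \, dP_n(\omega)
&\overset{(*)}{=} \int_I \gamma_n(B, \omega) \, dQ(\omega)
= \int_I \Big( \int_B f_n \, dQ_\omega \Big) dQ(\omega) \\
&= \int_I \Big( \int 1_B f_n \, dQ_\omega \Big) dQ(\omega)
= \int_I 1_B f_n \, dQ = \int_{I \cap B} f_n \, dQ \\
&= P_n(I \cap B),
\end{aligned}
$$

where (*) is lemma 8 and the step from the iterated integral to $\int_I 1_B f_n \, dQ$ is the defining property of the conditional expectation $E_Q(1_B f_n \mid \mathcal{I})$. This shows the required property (CP2) of definition 4. Hence the $\gamma_n$'s are versions of the conditional probabilities of the $P_n$'s given $\mathcal{I}$.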

Note that the $\gamma_n(\cdot, \omega)$ are measures because the $f_n$ had been chosen nonnegative everywhere. If we follow the line of argumentation of lemma 5 we find a set $E_n$ of $P_n$-measure one such that the measures $\beta_n(\cdot, \omega)$ and $\gamma_n(\cdot, \omega)$ agree for $\omega \in E_n$. Because of the invariance of $\beta_n$, $\gamma_n$ the set $E_n$ is invariant. Hence (lemma 8) also $Q(E_n) = 1$. In summary, we have

$$\omega \in E_n \implies \forall B \in \mathcal{B} : \int_B f_n \, dQ_\omega = \int_B f_{n,\omega} \, dQ_\omega.$$

As two functions agree almost everywhere if their integrals coincide over arbitrary events, we are done with the assertion of the lemma for the $f_n$.

We find an invariant set $E_\infty$ with $\bar{P}(E_\infty) = Q(E_\infty) = 1$ such that

$$f_\omega = f \quad \text{in } L_1(Q_\omega)$$

for $\omega \in E_\infty$ by a completely analogous argumentation. □

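Proof of lemma 7. Define

$$E^* := E_\infty \cap \Big( \bigcap_{n \ge 1} E_n \Big) \tag{28}$$

with $E_\infty$ and the $E_n$ from lemma 10. $E^*$ is invariant and $Q(E^*) = 1$, as both properties hold for each of the countably many sets on the right-hand side of (28). We obtain

$$\forall n \in \mathbb{N} : \; f_n = f_{n,\omega} \quad \text{and} \quad f = f_\omega \quad \text{in } L_1(Q_\omega)$$

for $\omega \in E^*$. Therefore also

$$\liminf_{n \to \infty} f_n = \liminf_{n \to \infty} f_{n,\omega} \quad \text{in } L_1(Q_\omega)$$

for $\omega \in E^*$. □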
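
The two properties claimed for the set in (28) can be spelled out in two lines. This is a brief sketch, assuming that "invariant" means $T^{-1}E = E$ and that each $E_n$ and $E_\infty$ has $Q$-measure one as stated in lemma 10.

```latex
% Full measure: the complement is a countable union of Q-null sets.
Q(\Omega \setminus E^*)
  = Q\Big( (\Omega \setminus E_\infty) \cup \bigcup_{n \ge 1} (\Omega \setminus E_n) \Big)
  \le Q(\Omega \setminus E_\infty) + \sum_{n \ge 1} Q(\Omega \setminus E_n)
  = 0 .

% Invariance: preimages commute with countable intersections.
T^{-1}E^*
  = T^{-1}E_\infty \cap \bigcap_{n \ge 1} T^{-1}E_n
  = E_\infty \cap \bigcap_{n \ge 1} E_n
  = E^* .
```

By lemma 8 applied to the indicator of the invariant set $E^*$, the same value is obtained under $P$, which gives $P(E^*) = Q(E^*) = 1$.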